Textual Article Clustering in Newspaper Pages
Authors
Abstract
In the analysis of a newspaper page, an important step is the clustering of the various text blocks into logical units, i.e., into articles. We propose three algorithms, based on text processing techniques, to cluster articles in newspaper pages. Based on the complexity of the three algorithms and on experiments with actual pages from the Italian newspaper L’Adige, we select one of them as the preferred solution to the textual clustering problem.
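The abstract does not describe the three algorithms themselves. As a generic illustration only, and not the authors' method, the following sketch shows one common text-based way to group blocks: compute bag-of-words cosine similarity between every pair of blocks and merge pairs above a threshold with union-find, so that similarity propagates transitively. The tokenizer, the 0.3 threshold, and the names `bag_of_words`, `cosine` and `cluster_blocks` are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a deliberately simple tokenizer (assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_blocks(blocks, threshold=0.3):
    """Group block indices whose pairwise similarity is at least `threshold`.

    The threshold value is a hypothetical default, not taken from the paper.
    Union-find makes the grouping transitive: if A~B and B~C,
    then A, B and C end up in the same cluster (article).
    """
    vectors = [bag_of_words(b) for b in blocks]
    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(blocks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


blocks = [
    "mayor opens new bridge over river",
    "new bridge over river cost millions",
    "football team wins derby",
]
print(cluster_blocks(blocks))  # [[0, 1], [2]]
```

Because merging is transitive, a chain of only moderately similar blocks can collapse into a single cluster; this sketch also ignores page layout entirely and relies on word overlap alone.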
Similar resources
Clustering in Newspaper Pages
In the analysis of a newspaper page an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experimentation on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as th...
An Architecture for Efficient News Items Clustering and Retrieval Based on Language Models for a Dynamic Collection of E-Newspapers
Newspaper pages comprise multiple individual articles divided into multiple columns. The challenging part of this task is to organize and integrate article blocks in the newspaper. This paper proposes a novel approach for article reconstruction from newspapers, including an aggregation of multiple sections of an article and reading-order recovery of each individual article. Thus, the process combi...
Linking article parts for the creation of newspaper digital library
An important issue pertaining to the retro-conversion of newspapers, i.e. the conversion of newspaper issues into digital resources, is the identification and appropriate digital representation of an article. To complete this task, a number of steps have to be followed, from segmentation of the newspaper image to optical character recognition and linking of different items belonging to the same...
Web pages, text types, and linguistic features: Some issues
1 Introduction: With the growth of the Web, a massive quantity of documents, namely web pages, is freely available for (corpus-)linguistic studies. Web pages can be considered a new kind of document, much more unpredictable and individualized than paper documents. While the linear organization of most paper documents is still reflected in traditional electronic corpora, such as the British Na...
Metadiscourse Markers: A Contrastive Study of Translated and Non-Translated Persuasive Texts
Metadiscourse features are those facets of a text that make the organization of the text explicit, provide information about the writer's attitude toward the text content, and engage the reader in the interaction. This study interpreted metadiscourse markers in translated and non-translated persuasive texts. To this end, the researcher chose the translated versions of one of the leading newsp...
Journal title:
- Applied Artificial Intelligence
Volume 20
Publication date: 2006